An active learning framework for multi-group mean estimation

Neural Information Processing Systems

After observing a sample, the analyst may update their estimate of the mean and variance of that group and choose the next group accordingly. The analyst's objective is to dynamically collect samples to minimize the




An active learning framework for multi-group mean estimation

Neural Information Processing Systems

We consider a fundamental problem where there are multiple groups whose data distributions are unknown, and an analyst would like to learn the mean of each group. We consider an active learning framework to sequentially collect $T$ samples with bandit feedback, each period observing a sample from a chosen group. After observing a sample, the analyst may update their estimate of the mean and variance of that group and choose the next group accordingly. The objective is to dynamically collect samples to minimize the $p$-norm of the vector of variances of our mean estimators after $T$ rounds. We propose an algorithm, Variance-UCB, that selects groups according to an upper bound on the variance estimate adjusted to the $p$-norm chosen. We show that the regret of Variance-UCB is $O(T^{-2})$ for finite $p$, and prove that no algorithm can do better. When $p$ is infinite, we recover the $O(T^{-1.5})$
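
As a rough illustration of the sampling loop described in this abstract, the following is a minimal sketch, not the authors' implementation. The abstract does not give the selection rule, so the index used here (an upper confidence bound on the sample variance, raised to the power $p$ and divided by $n^{p+1}$), the confidence bonus, the warm-up of two samples per group, and the names `variance_ucb`, `sample_fns`, and `bonus_scale` are all assumptions chosen for illustration.

```python
import math
import random


def variance_ucb(sample_fns, T, p=2.0, bonus_scale=1.0, seed=0):
    """Allocate T samples across groups using a variance-UCB style index.

    sample_fns: one callable per group, taking a random.Random and
                returning a float (hypothetical interface).
    Returns (means, counts) after T samples in total.
    """
    G = len(sample_fns)
    if T < 2 * G:
        raise ValueError("need at least two samples per group")
    rng = random.Random(seed)
    counts = [0] * G
    means = [0.0] * G
    m2 = [0.0] * G  # running sum of squared deviations (Welford update)

    def observe(g):
        x = sample_fns[g](rng)
        counts[g] += 1
        delta = x - means[g]
        means[g] += delta / counts[g]
        m2[g] += delta * (x - means[g])

    def index(g):
        n = counts[g]
        var_hat = m2[g] / (n - 1)  # unbiased sample variance
        # Assumed confidence bonus; the paper's exact form may differ.
        ucb = var_hat + bonus_scale * math.sqrt(math.log(T) / n)
        # Assumed index: optimistic marginal gain for the p-norm objective.
        return ucb ** p / n ** (p + 1)

    # Warm-up: two samples per group so the sample variance is defined.
    for g in range(G):
        observe(g)
        observe(g)

    # Main loop: sample the group with the largest index.
    for _ in range(T - 2 * G):
        observe(max(range(G), key=index))

    return means, counts


if __name__ == "__main__":
    groups = [lambda r: r.gauss(0.0, 1.0), lambda r: r.gauss(5.0, 3.0)]
    means, counts = variance_ucb(groups, T=2000, p=2.0)
    print("estimated means:", means)
    print("samples per group:", counts)
```

Under these assumptions the higher-variance group receives more samples, which is the qualitative behavior the abstract describes; the regret rates quoted above are the paper's results and are not demonstrated by this sketch.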


Active Learning for Machine Learning Driven Molecular Dynamics

Bachelor, Kevin, Murdeshwar, Sanya, Sabo, Daniel, Marinescu, Razvan

arXiv.org Artificial Intelligence

Machine-learned coarse-grained (CG) potentials are fast, but degrade over time when simulations reach under-sampled bio-molecular conformations, and generating widespread all-atom (AA) data to combat this is computationally infeasible. We propose a novel active learning (AL) framework for CG neural network potentials in molecular dynamics (MD). Building on the CGSchNet model, our method employs root mean squared deviation (RMSD)-based frame selection from MD simulations in order to generate data on-the-fly by querying an oracle during the training of a neural network potential. This framework preserves CG-level efficiency while correcting the model at precise, RMSD-identified coverage gaps. By training CGSchNet, a coarse-grained neural network potential, we empirically show that our framework explores previously unseen configurations and trains the model on unexplored regions of conformational space. Our active learning framework enables a CGSchNet model trained on the Chignolin protein to achieve a 33.05% improvement in the Wasserstein-1 (W1) metric in Time-lagged Independent Component Analysis (TICA) space on an in-house benchmark suite.
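
To make the idea of RMSD-based frame selection concrete, here is a minimal sketch under stated assumptions. The abstract does not specify the selection criterion, so this example assumes that a simulation frame is "novel" when its smallest RMSD to any training frame (after optimal rigid superposition via the Kabsch method) is large, and that the most novel frames are the ones sent to the oracle. The function names `kabsch_rmsd` and `select_frames` and the `n_query` parameter are hypothetical.

```python
import numpy as np


def kabsch_rmsd(a, b):
    """RMSD between two (n_beads, 3) arrays after optimal rigid superposition."""
    a = a - a.mean(axis=0)  # remove translation
    b = b - b.mean(axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)
    # Sign correction so the optimal transform is a proper rotation.
    d = np.sign(np.linalg.det(u @ vt))
    overlap = s[0] + s[1] + d * s[2]
    msd = ((a ** 2).sum() + (b ** 2).sum() - 2.0 * overlap) / len(a)
    return float(np.sqrt(max(msd, 0.0)))  # clip tiny negative round-off


def select_frames(sim_frames, train_frames, n_query=1):
    """Pick the simulation frames farthest (in min-RMSD) from the training set.

    Returns (indices of selected frames, novelty score per simulation frame).
    """
    novelty = np.array([
        min(kabsch_rmsd(frame, ref) for ref in train_frames)
        for frame in sim_frames
    ])
    picked = np.argsort(-novelty)[:n_query]
    return picked.tolist(), novelty
```

In a full pipeline the selected frames would be passed to the oracle for labeling and added to the training set; that step depends on the simulation and model code and is left out here.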


Active learning framework leveraging transcriptomics identifies modulators of disease phenotypes

Science

We introduced a perturbational single-cell RNA sequencing (scRNA-seq) dataset with 1.2 million cells spanning 88 perturbations across 10 primary and cancer cell lines. Using this dataset along with public perturbational omics data (held-out CMap and SciPlex signatures), we showed that DrugReflector robustly prioritizes compounds from transcriptional signatures even outside of its training context, consistently outperforming state-of-the-art approaches. Through two hematopoietic campaigns using single-cell atlas–defined cell state transitions as model inputs, we identified inducers of megakaryocyte and erythroid differentiation, achieving hit rates 10-fold higher than a random baseline. To assess generalizability, we additionally deployed DrugReflector in two distinct oncology indications, recovering clinical standards of care and modulators of known indication-specific pathways. To further characterize and leverage the transcriptional drivers of megakaryocyte induction, we created a time-course scRNA-seq dataset of hematopoietic stem and progenitor cells with paired flow cytometry readouts for a range of transcriptionally and phenotypically active compounds.


An Active Learning Framework using Sparse-Graph Codes for Sparse Polynomials and Graph Sketching

Xiao Li, Kannan Ramchandran

Neural Information Processing Systems

The goal is to learn the polynomial by querying the values of f. We introduce an active learning framework that is associated with a low query cost and computational runtime. The significant savings are enabled by leveraging sampling strategies based on modern coding theory, specifically, the design and analysis of sparse-graph codes, such as Low-Density Parity-Check (LDPC) codes, which represent the state of the art in modern packet communications. More significantly, we show how this design perspective leads to exciting, and to the best of our knowledge, largely unexplored intellectual connections between learning and coding. The key is to relax the worst-case assumption with an ensemble-average setting, where the polynomial is assumed to be drawn uniformly at random from the ensemble of all polynomials (of a given size n and sparsity s).